feat(packaging): one-command upgrade (auto-migrate the DB safely on dnf/apt update) #569
Merged
`dnf update openwatch*` (or apt) now applies pending DB migrations automatically, with a restore point and a fail-safe service state, for the single-instance appliance model. The operator runs one command.

Mechanism:
- internal/dbbackup: builds a pg_dump command that takes connection params (especially the password) via PG* env, NEVER argv (no ps leak).
- `openwatch migrate` gains --status (report pending migrations without applying) and --backup-dir (pg_dump restore point BEFORE Apply; skipped on a fresh DB; FAILS CLOSED: never migrates if the backup fails).
- packaging/common/openwatch-upgrade.sh runs from the RPM %post when $1 >= 2 and from the DEB postinst when the old-version $2 is set (upgrade only, never a fresh install): stop -> openwatch migrate --backup-dir -> start on success. On failure it leaves the service STOPPED, prints the restore path, and exits non-zero so the package manager surfaces it. Migrations are transactional, so a failure rolls back atomically (data intact).
- Backup retention: openwatch-backup-cleanup.timer (daily) prunes dumps older than BACKUP_RETENTION_DAYS but always keeps the most recent one. Operator-tunable via /etc/openwatch/upgrade.conf (AUTO_BACKUP, etc.).

Deliberately OUT of scope (documented): PostgreSQL ENGINE major-version upgrades (operator-supervised pg_upgrade, never from a scriptlet) and multi-instance/zero-downtime upgrades.

Spec: release-upgrade (C-01..05 / AC-01..07); docs/runbooks/UPGRADING.md.

Verified end to end: migrate --status + --backup-dir produce a real pg_dump and then migrate against a live DB; both packages ship and render the upgrade-only guard.
Proves the real upgrade path end to end, beyond the source-inspection and scriptlet-logic tests: install the OLD openwatch RPM (release 1), stand up Postgres, roll the schema back one migration to simulate the prior version, then `rpm -U` the NEW RPM (release 2) and assert that the package's %post scriptlet migrated the DB to head (34 -> 35, host_connection_profile created), took a pre-upgrade backup, and issued the service stop/start.
- packaging/tests/upgrade-container-test.sh runs inside rockylinux:9 (a systemctl shim records stop/start, since the container has no systemd).
- packaging/tests/run-upgrade-container-test.sh is the one-command host driver: it builds the two RPM releases and runs the container.
- openwatch-upgrade.sh: OPENWATCH_UPGRADE_CONF / OPENWATCH_SECRETS_ENV overrides (defaulting to the production /etc paths) let the test point config and secrets at a scratch fixture. No production behavior change.

Verified locally: RESULT PASS (34 -> 35, backup taken, stop+start issued).
The two scripts were silently caught by .gitignore's blanket `*test*.sh` and dropped from the previous commit. Add a `!packaging/tests/*test*.sh` exception so committed test-harness scripts are tracked, and add the container upgrade test + its host driver.
Adds an `upgrade` job to package-smoke that exercises the full package UPGRADE path the per-distro `smoke` (fresh install) job can't: build an old (release 1) + new (release 2) RPM, then in a rockylinux:9 container install the old, stand up Postgres, roll the schema back one migration, and `rpm -U` the new — asserting the %post scriptlet migrates the DB to head, takes a pre-upgrade backup, and stop/starts the service. Runs the committed packaging/tests/run-upgrade-container-test.sh driver. Fires on the same packaging-change / tag / dispatch triggers as the rest of the workflow, so it runs on this PR.
The command is the constant pg_dump, args are package-controlled flags + an output path (never user input), and connection params travel via env not argv. Annotate both exec sites with // #nosec G204 (the repo's convention) so make lint passes.
remyluslosius added a commit that referenced this pull request on Jun 16, 2026
- Un-ignore SESSION_LOG.md (.gitignore listed it next to the already-tracked BACKLOG.md; both are the session-continuity docs that CLAUDE.md/BACKLOG reference for provenance) and add it with the 2026-06-16 handoff: SSH full-matrix + per-host learning (#566), packaging fresh-install + auto-upgrade (#564/#569), CI gate speedup (#567), settings/cleanup (#561/#562/#563/#568), and the Dependabot triage (9 merged / 6 skipped), plus next steps and gotchas.
- BACKLOG: drop the completed PKG-1/PKG-2 (shipped in #564); add the SSH learning follow-up (wire connprofile into discovery/intelligence/liveness) and a "Deferred Dependency Upgrades" section (MUI 7→9, eslint 10 blocked upstream, cosign-installer v4 signing migration). Bump date.
Goal
`sudo dnf update -y openwatch*` (or apt) should be all an operator does: the package handles the app and DB-schema upgrade automatically, with a backup, without breaking the app or losing data. This implements exactly that for the single-instance appliance model.
What happens on upgrade (automatic)
The RPM `%post` (when `$1 -ge 2`) / DEB `postinst` (when the old-version `$2` is set), upgrade only and never on a fresh install, runs `openwatch-upgrade.sh`: `openwatch migrate --backup-dir` → `pg_dump` a restore point, then apply. Each migration is transactional (atomic rollback on failure).
Safety properties
`internal/dbbackup` passes connection params to `pg_dump` via `PG*` env only (unit-pinned, AC-01).
Operator surface
- `openwatch migrate --status`: see pending migrations before upgrading.
- `/etc/openwatch/upgrade.conf`: `AUTO_BACKUP`, `BACKUP_DIR`, `BACKUP_RETENTION_DAYS` (config-noreplace).
- `openwatch-backup-cleanup.timer` (daily prune), enabled on install/upgrade.
- `docs/runbooks/UPGRADING.md`: the one-command flow, recovery, restore-from-backup.
Deliberately out of scope (documented)
PostgreSQL engine major-version upgrades: operator-supervised `pg_upgrade`, never silently from a scriptlet (it can lose the whole DB). Minor/patch Postgres comes via dnf/apt dependencies.
Verification
- … the `/var/lib/openwatch/backups` dir, and `upgrade.conf` (config).
- `migrate --status` reports correctly; `migrate --backup-dir` produced a valid 82 KB `pg_dump`, then migrated; DSN redacted in logs.
- `gofmt` / `go vet` / `go build` clean; `specter check` (107 specs); `dbbackup` + `release-upgrade` AC-01..07 tests pass.
Spec: new `release-upgrade` (C-01..05 / AC-01..07).
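The verification notes that the DSN is redacted in logs. How OpenWatch does this is not stated; one standard-library option for URL-form DSNs is `url.URL.Redacted` (Go 1.15+), which replaces the password with `xxxxx`. The sketch below uses it in a hypothetical `redactDSN` helper and, as an added assumption, hides non-URL (libpq key=value) DSNs entirely rather than risk echoing a password.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactDSN masks the password in a postgres:// DSN before it is logged.
// Anything that is not a postgres URL is hidden wholesale, because
// key=value DSNs would otherwise pass through with the password intact.
func redactDSN(dsn string) string {
	u, err := url.Parse(dsn)
	if err != nil || (u.Scheme != "postgres" && u.Scheme != "postgresql") {
		return "<redacted DSN>"
	}
	return u.Redacted()
}

func main() {
	fmt.Println(redactDSN("postgres://openwatch:s3cret@localhost:5432/openwatch?sslmode=disable"))
	// postgres://openwatch:xxxxx@localhost:5432/openwatch?sslmode=disable
}
```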